Introduction

Why was this show chosen? Due to all artists being established, they have a fanbase, which could play into the results. On survival shows where aspiring trainees are trying to debut, many trainees are unknown which means that they don’t have any influence from before the show starting.

The format of the show also makes it better than other shows for analysis, due to the outcomes of the contestants being evenly divided, with 7 people eliminated or debuting during each phase. The question this analysis asks how does the various factors of a group/artist affect their placement on Queendom Puzzle?

Progression Guide

Importing

library(ggplot2)
library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
## ✔ broom        1.0.5     ✔ rsample      1.2.0
## ✔ dials        1.2.0     ✔ tibble       3.2.1
## ✔ infer        1.0.5     ✔ tidyr        1.3.0
## ✔ modeldata    1.2.0     ✔ tune         1.1.2
## ✔ parsnip      1.1.1     ✔ workflows    1.1.3
## ✔ purrr        1.0.2     ✔ workflowsets 1.0.1
## ✔ recipes      1.0.9     ✔ yardstick    1.2.0
## Warning: package 'recipes' was built under R version 4.3.0
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ recipes::step()  masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
QP_data <- read_excel("QP-Ranking.xlsx")
QP_Ranking <- as_tibble(QP_data)

head(QP_Ranking)
## # A tibble: 6 × 12
##   episode initial_rank name    ranking signal_song     initial_group remix_song 
##     <dbl>        <dbl> <chr>     <dbl> <chr>           <chr>         <chr>      
## 1       1            4 Nana         22 PICK on the top Woo!ah!       Nxde       
## 2       1            1 Yeoreum       1 PICK-CAT        WJSN          Don't Call…
## 3       1            4 Hwiseo       22 Athena          H1-KEY        Shut Down  
## 4       1            1 Kei           1 Athena          Lovelyz       Only One   
## 5       1            2 Yeeun         8 PICK-CAT        CLC           Don't Call…
## 6       1            2 Jihan         8 PICK on the top Weeekly       Nxde       
## # ℹ 5 more variables: all_rounder_team <chr>, vocal_rap_song <chr>,
## #   dance_song <chr>, semi_final_song <chr>, finale_song <chr>

El7Z UP Members Progression

QP_Ranking_EL7ZUP <- QP_Ranking %>%
  filter(name == "Hwiseo" | name == "Nana" | name == "Yuki" | name == "Kei" | name == "Yeoreum" | name == "Yeonhee" | name == "Yeeun")

gfg_plot <- ggplot(QP_Ranking_EL7ZUP, aes(x=episode, y=ranking, group=name, color=name)) +
  geom_line() +
  geom_point() +
  xlab("Episode Number") +
  ylab("Rank") +
  scale_x_continuous(breaks=1:10) +
  scale_y_continuous(trans="reverse", breaks=1:28) +
  facet_wrap(~name) +
  scale_colour_discrete(name="Name")

gfg_plot

Entire Cast Progression

gfg_plot2 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=name)) +
  geom_line() +
  geom_point() +
  xlab("Episode Number") +
  ylab("Rank") +
  scale_x_continuous(breaks=1:10) +
  scale_y_continuous(trans="reverse", breaks=1:28) +
  facet_wrap(~name) +
  scale_colour_discrete(name="Name")

gfg_plot2

Grouped Dependent on their Signal Song Group

  • Note: Despite leaving before the performance, Chaeyeon and Haein are credited as being members of PICK on the top and DROP The Beat respectively.
gfg_plot3 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=signal_song)) +
  geom_line() +
  geom_point() +
  xlab("Episode Number") +
  ylab("Rank") +
  scale_x_continuous(breaks=1:10) +
  scale_y_continuous(trans="reverse", breaks=1:28) +
  facet_wrap(~signal_song) +
  scale_colour_discrete(name="Signal Song")

gfg_plot3

Grouped Dependent on their Initial Groups

  • Note: Miru and Fye are credited as being from NMB48 and BNK48 respectively.
gfg_plot4 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=initial_group)) +
  geom_line() +
  geom_point() +
  xlab("Episode Number") +
  ylab("Rank") +
  scale_x_continuous(breaks=1:10) +
  scale_y_continuous(trans="reverse", breaks=1:28) +
  facet_wrap(~initial_group) +
  scale_colour_discrete(name="Affiliation")

gfg_plot4

Grouped Dependent on their Remix Battle Songs

QP_Ranking_Remix <- filter(QP_Ranking, remix_song != 'NA')

gfg_plot5 <- ggplot(QP_Ranking_Remix, aes(x=episode, y=ranking, group=name, color=remix_song)) +
  geom_line() +
  geom_point() +
  xlab("Episode Number") +
  ylab("Rank") +
  scale_x_continuous(breaks=1:10) +
  scale_y_continuous(trans="reverse", breaks=1:28) +
  facet_wrap(~remix_song) +
  scale_colour_discrete(name="Remix")

gfg_plot5

Creating a model

To analyze popularity, I will use the ranks given on the first episode of the show that were based on their prior career experience in the industry. This is a fair metric of popularity to compare, except for the fact that Fye and Miru are ranked low as they have no K-pop experience, despite having a fanbase outside of it which could influence the votes.

summarized_data <- QP_Ranking %>%
                   group_by(name)
summarized_split <- initial_split(summarized_data, prop = 0.75, strata = initial_rank)

summarized_training <- training(summarized_split)
summarized_testing <- testing(summarized_split)